Integrated dimensionality reduction technique for mixed-type data involving categorical values
Abstract
An extension to the recent dimensionality-reduction technique t-SNE is proposed. The extension enables t-SNE to handle mixed-type datasets. Each attribute of the data is associated with a distance hierarchy, which allows the distance between numeric values and between categorical values to be measured in a unified manner. More importantly, domain knowledge regarding the semantic distance between categorical values can be specified in the hierarchy. Consequently, the extended t-SNE can reflect the topological order of the high-dimensional, mixed data in the low-dimensional space.
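To make the idea of a per-attribute distance hierarchy concrete, the following is a minimal sketch, not the authors' implementation. It assumes one simple formulation: a categorical attribute is a weighted tree whose leaves are the category values, and the distance between two values is the total link weight on the path joining them; a numeric attribute is treated as a range-normalised absolute difference. The attribute names, the `DRINK_PARENT` tree, its link weights, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical hierarchy for a "drink" attribute: child -> (parent, link weight).
# Weights are chosen so the longest leaf-to-leaf path has length 1.0,
# matching the [0, 1] scale used for numeric attributes below.
DRINK_PARENT = {
    "Juice": ("Any", 0.25),
    "Carbonated": ("Any", 0.25),
    "Apple": ("Juice", 0.25),
    "Orange": ("Juice", 0.25),
    "Coke": ("Carbonated", 0.25),
    "Pepsi": ("Carbonated", 0.25),
}


def path_to_root(value, parent):
    """Map each ancestor of `value` (and `value` itself) to its distance from `value`."""
    distances = {value: 0.0}
    node, total = value, 0.0
    while node in parent:
        node, weight = parent[node][0], parent[node][1]
        total += weight
        distances[node] = total
    return distances


def categorical_distance(a, b, parent):
    """Path length between two values through their nearest common ancestor."""
    up_a = path_to_root(a, parent)
    up_b = path_to_root(b, parent)
    # With positive weights, the nearest common ancestor gives the smallest sum.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def numeric_distance(a, b, low, high):
    """Absolute difference scaled by the attribute's range."""
    return abs(a - b) / (high - low)


def mixed_distance(x, y, schema):
    """Euclidean-style combination of per-attribute distances."""
    total = 0.0
    for xi, yi, spec in zip(x, y, schema):
        if spec[0] == "cat":
            d = categorical_distance(xi, yi, spec[1])
        else:  # ("num", low, high)
            d = numeric_distance(xi, yi, spec[1], spec[2])
        total += d * d
    return math.sqrt(total)


def distance_matrix(records, schema):
    """Pairwise mixed-type distances as a list of lists."""
    return [[mixed_distance(r, s, schema) for s in records] for r in records]


# Hypothetical records: (drink, price), with price assumed to lie in [0, 20].
SCHEMA = [("cat", DRINK_PARENT), ("num", 0.0, 20.0)]
RECORDS = [("Apple", 10.0), ("Orange", 10.0), ("Coke", 10.0), ("Apple", 20.0)]
MATRIX = distance_matrix(RECORDS, SCHEMA)
```

In this sketch, "Apple" lies closer to "Orange" (both under "Juice") than to "Coke", which is the kind of semantic ordering that plain one-hot encoding cannot express. A pairwise matrix of this form could then be handed to a t-SNE implementation that accepts precomputed distances; how the paper itself integrates the hierarchy into t-SNE is not described in the abstract.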
Related resources
SpectralCAT: Categorical spectral clustering of numerical and nominal data
Data clustering is a common technique for data analysis, which is used in many fields, including machine learning, data mining, customer segmentation, trend analysis, pattern recognition and image analysis. Although many clustering algorithms have been proposed, most of them deal with clustering of one data type (numerical or nominal) or with mixed data types (numerical and nominal) and only few o...
Visual Analysis of Multi-Dimensional Categorical Data Sets
We present a set of interactive techniques for the visual analysis of multi-dimensional categorical data. Our approach is based on multiple correspondence analysis (MCA), which allows one to analyse relationships, patterns, trends and outliers among dependent categorical variables. We use MCA as a dimensionality reduction technique to project both observations and their attributes in the same 2...
Interpretable Dimensionality Reduction for Classification with Functional Data
Classification problems involving a categorical class label Y and a functional predictor X(t) are becoming increasingly common. Since X(t) is essentially infinite dimensional some form of dimensionality reduction is essential in these problems. Conventional data reduction techniques for functional data can be categorized into functional principal component analysis and filtering methods. Howeve...
HIMIC: A Hierarchical Mixed Type Data Clustering Algorithm
Clustering is an important data mining technique. There are many algorithms that cluster either numeric or categorical data. However few algorithms cluster mixed type datasets with both numerical and categorical attributes. In this paper, we propose a similarity measure between two clusters that enables hierarchical clustering of data with numerical and categorical attributes. This similarity m...
Clustering Mixed Numeric and Categorical Data: A Cluster Ensemble Approach
Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes. However, datasets with mixed types of attributes are common in real life data mining applications. In this paper, we propose a novel divide-and-conquer techniq...
Journal title:
- Appl. Soft Comput.
Volume 43
Publication date: 2016